Neural Style Transfer

What can it do?

Given a content image and a style image, it produces a new image that keeps the content of the first while adopting the visual style (colors, textures, brush strokes) of the second.

How?

  1. Given two images: image_style and image_content.

  2. Define distance functions: style_distance(img1, img2), content_distance(img1, img2)

  • content_distance: the Euclidean distance between the intermediate representations of the two images (a one-line version in code is given right after the style-loss code below).

  • style_distance: a bit more involved; see the code below:

import tensorflow as tf

def gram_matrix(input_tensor):
    # Flatten the spatial dimensions, keeping channels last:
    # a has shape (h * w, channels)
    channels = int(input_tensor.shape[-1])
    a = tf.reshape(input_tensor, [-1, channels])
    n = tf.shape(a)[0]
    # gram is a channels x channels matrix of channel-wise inner
    # products, normalized by the number of spatial positions
    gram = tf.matmul(a, a, transpose_a=True)
    return gram / tf.cast(n, tf.float32)

def get_style_loss(base_style, gram_target):
    """Expects a feature map of dimension (h, w, c) and a precomputed
    target Gram matrix of dimension (c, c)."""
    gram_style = gram_matrix(base_style)
    # Mean squared difference between the two Gram matrices
    return tf.reduce_mean(tf.square(gram_style - gram_target))
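For reference, the content distance from the first bullet above is just one line in the same style. A minimal sketch, where base_content and target_content are the intermediate feature maps of the two images taken from the same layer:

def get_content_loss(base_content, target_content):
    # Mean squared (Euclidean) distance between the two
    # intermediate representations
    return tf.reduce_mean(tf.square(base_content - target_content))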

The Gram matrix measures the correlations between the different channels of an intermediate layer; matching Gram matrices therefore matches second-order statistics of the feature activations rather than their exact spatial arrangement.
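What the two-step list above leaves implicit is the optimization itself: the generated image is treated as the variable and updated by gradient descent to minimize a weighted sum of the two distances, typically starting from the content image. Below is a minimal sketch of that loop; feature_extractor, content_image, content_target, and gram_targets are hypothetical names (e.g. a model built from selected VGG19 layers, with the targets precomputed from image_content and image_style):

import tensorflow as tf

opt = tf.keras.optimizers.Adam(learning_rate=0.02)
gen_image = tf.Variable(content_image)  # start from the content image

for step in range(1000):
    with tf.GradientTape() as tape:
        # Hypothetical model: returns one content feature map and a
        # list of style feature maps for the current generated image
        content_feat, style_feats = feature_extractor(gen_image)
        c_loss = get_content_loss(content_feat, content_target)
        s_loss = tf.add_n([get_style_loss(f, g)
                           for f, g in zip(style_feats, gram_targets)])
        loss = 1e4 * c_loss + 1e-2 * s_loss  # content / style weights
    grad = tape.gradient(loss, gen_image)
    opt.apply_gradients([(grad, gen_image)])
    # Keep pixel values in a valid range
    gen_image.assign(tf.clip_by_value(gen_image, 0.0, 1.0))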

New explanation:

We start from the Gram matrix used in style transfer and try to explain why the Gram matrix can represent the style of an image. This was the point I found most puzzling after reading the style transfer paper. By chance, we discovered that matching the Gram matrices of two images is mathematically strictly equivalent to minimizing the MMD distance, with a second-order polynomial kernel, between the deep activations of the two images. The MMD distance is a measure of the discrepancy between two distributions, computed from samples drawn from each. So in essence, what the style transfer paper does is match the distribution of the generated image's deep activations to that of the style image. This can be regarded as a domain adaptation problem, so it is natural to use ideas similar to adaBN here. A series of follow-up works extended this idea, including adaIN [3] and several GAN-based approaches to style transfer.
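To make the claimed equivalence concrete, here is the calculation from the paper cited below, sketched in slightly simplified notation. Write a layer's activations as one $C$-dimensional vector per spatial position, $f_1, \dots, f_N$ for the generated image and $g_1, \dots, g_N$ for the style image, so that the normalized Gram matrices are $G_F = \frac{1}{N}\sum_i f_i f_i^\top$ and $G_G$ likewise. Expanding the squared Frobenius norm of their difference gives

$$
\|G_F - G_G\|_F^2
= \frac{1}{N^2}\sum_{i,i'} (f_i^\top f_{i'})^2
- \frac{2}{N^2}\sum_{i,j} (f_i^\top g_j)^2
+ \frac{1}{N^2}\sum_{j,j'} (g_j^\top g_{j'})^2
= \mathrm{MMD}^2[F, G],
$$

which is exactly the (biased) squared MMD estimate between the two sets of activations under the second-order polynomial kernel $k(x, y) = (x^\top y)^2$.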

Li, Yanghao, Naiyan Wang, Jiaying Liu, and Xiaodi Hou. "Demystifying Neural Style Transfer." arXiv preprint arXiv:1701.01036 (2017).